Vaccination is one of the most effective public health interventions to prevent infectious diseases and reduce mortality and morbidity. However, there are still significant gaps and inequalities in vaccination coverage across countries and regions, which pose challenges for achieving global health goals and ensuring equity. Understanding the patterns and drivers of vaccination coverage among countries can help inform policy decisions and interventions to improve immunization programs and outcomes.

In this project, we propose a data-driven approach to analyzing global vaccination coverage data using time-series clustering. Time-series clustering groups time series by their similarity in shape, trend, or variability. By applying it to vaccination coverage data, we aim to identify clusters of countries with similar vaccination trends over time and to explore the characteristics and factors that explain the differences between clusters.
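
To make the idea concrete, here is a minimal sketch of time-series clustering in base R. It assumes Euclidean distance between whole series and Ward hierarchical clustering, which is only one of many possible choices and not necessarily the method this project will settle on. The six "countries" and all coverage values are synthetic, and `make_series` is a hypothetical helper written only for this illustration.

```r
# Synthetic coverage series (percent) for six hypothetical countries.
# Nothing here comes from the WHO/UNICEF data; it only shows the mechanics.
set.seed(42)
years <- 2012:2021
n <- length(years)

# Hypothetical helper: linear trend plus small noise, clipped to [0, 100]
make_series <- function(start, slope) {
  pmin(pmax(start + slope * seq_len(n) + rnorm(n, sd = 1), 0), 100)
}

series <- rbind(
  high_a    = make_series(95, 0),
  high_b    = make_series(93, 0),
  rising_a  = make_series(50, 3),
  rising_b  = make_series(55, 3),
  falling_a = make_series(90, -4),
  falling_b = make_series(85, -4)
)
colnames(series) <- years

# Pairwise Euclidean distance between rows (one row per country)
d <- dist(series, method = "euclidean")

# Agglomerative clustering with Ward's criterion, cut into three groups
hc <- hclust(d, method = "ward.D2")
clusters <- cutree(hc, k = 3)
print(clusters)
```

In this toy setup the stable, rising, and falling series end up in separate groups. Real coverage data has missing years and less clear-cut shapes, so the distance measure and the number of clusters would need to be chosen with more care.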

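We will use the 2021 revision of the WHO/UNICEF Estimates of National Immunization Coverage (WUENIC) as our main data source. It provides vaccine coverage estimates for 194 countries. The code below treats the first 14 sheets of the workbook as per-vaccine data and sheet 15 as the regional and global aggregates. The file is available at https://data.unicef.org/wp-content/uploads/2016/07/wuenic2021rev_web-update.xlsx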

| Vaccine | Explanation | Age of administration | Value for analysis |
|---|---|---|---|
| BCG | Bacille Calmette-Guerin vaccine for tuberculosis (TB) disease | At birth or as soon as possible after birth | Can indicate the TB burden and the quality of maternal and child health services |
| DTP1 | First dose of diphtheria, tetanus, and pertussis vaccine | At 6 weeks of age | Can indicate the timeliness of immunization initiation |
| DTP3 | Third dose of diphtheria, tetanus, and pertussis vaccine | At 14 weeks of age | Can reflect the performance and equity of immunization programs |
| HEPBB | Hepatitis B vaccine at birth | Within 24 hours of birth | Can prevent perinatal transmission of hepatitis B virus and chronic infection |
| HIB3 | Third dose of Haemophilus influenzae type b vaccine | At 14 weeks of age | Can indicate the impact of Hib vaccination on meningitis and pneumonia burden and mortality |
| IPV1 | First dose of inactivated poliovirus vaccine | At 14 weeks of age | Can indicate the progress toward polio eradication and the switch from oral to inactivated polio vaccine |
| MCV1 | First dose of measles-containing vaccine. Measles is a viral infection that can cause fever, rash, and complications such as pneumonia and encephalitis. | At 9 months of age | Can indicate the progress toward measles elimination |
| MCV2 | Second dose of measles-containing vaccine. Measles is a viral infection that can cause fever, rash, and complications such as pneumonia and encephalitis. | At 15-18 months of age or at school entry (4-6 years) depending on the country’s schedule | Can indicate the quality and sustainability of measles immunization programs |
| PCV3 | Third dose of pneumococcal conjugate vaccine. Pneumococcal disease is caused by bacteria that can cause pneumonia, meningitis, sepsis, and otitis media. | At 14 weeks of age | Can indicate the impact of PCV vaccination on pneumococcal disease burden and mortality |
| POL3 | Third dose of oral poliovirus vaccine. Polio is a viral infection that can cause paralysis and death. | At 14 weeks of age | Can indicate the progress toward polio eradication and the switch from oral to inactivated polio vaccine |
| RCV1 | First dose of rubella-containing vaccine. Rubella is a viral infection that can cause fever, rash, and congenital rubella syndrome (CRS) in pregnant women and their babies. | At 9-9.5 months of age | Can indicate the progress toward rubella and CRS elimination |
| ROTAC | Rotavirus vaccine. Rotavirus is a viral infection that can cause severe diarrhea, dehydration, and death in children. | At 6 and 10 weeks of age | Can indicate the impact of rotavirus vaccination on diarrheal disease burden and mortality |
| YFV | Yellow fever vaccine. Yellow fever is a viral infection that can cause fever, jaundice, bleeding, organ failure, and death. It is transmitted by mosquitoes in tropical regions. | At 9 months of age or older for people living in or traveling to areas at risk for yellow fever transmission. A single dose provides lifelong protection for most people. Booster doses may be required for some travelers. | Can indicate the risk of yellow fever outbreaks and the need for preventive measures such as mosquito control and vaccination campaigns |

library(readxl)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.0     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
#> ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
#> ✔ purrr     1.0.1     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(plotly)
#> 
#> Attaching package: 'plotly'
#> 
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> 
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> 
#> The following object is masked from 'package:graphics':
#> 
#>     layout

Loading the data

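# Path to the downloaded WUENIC workbook
path <- "wuenic2021rev_web-update.xlsx"
sheet_names <- excel_sheets(path)

# Read every sheet into a list, one data frame per sheet
data_list <- lapply(sheet_names, function(s) read_excel(path, sheet = s))

# Sheet 15 holds the regional and global aggregates; sheets 1-14 are per-vaccine
agg_data <- data_list[[15]]
data_list <- data_list[1:14]

# Stack the per-vaccine sheets into one wide data frame
data_combined <- do.call(rbind, data_list)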

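Reshaping the data into long (tidy) format

# Columns 5 onward are the year columns; turn them into year/coverage pairs
data_combined <- pivot_longer(data_combined, cols = c(5:ncol(data_combined)), names_to = "year", values_to = "coverage")
# Represent each year as the date of its first day
data_combined$year <- as.Date(paste0(data_combined$year, "-01-01"))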
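
To show what this reshaping does, here is a small self-contained sketch on an invented two-country table. The layout (four identifier columns followed by one column per year) is an assumption based on the code above; the column name `unicef_region`, the country codes, and all values are hypothetical.

```r
library(tidyr)

# Invented wide-format extract: identifier columns, then one column per year
wide <- data.frame(
  unicef_region = c("Region X", "Region Y"),
  iso3 = c("AAA", "BBB"),
  country = c("Country A", "Country B"),
  vaccine = c("DTP3", "DTP3"),
  `2021` = c(90, 70),
  `2020` = c(88, NA),
  check.names = FALSE  # keep "2021" and "2020" as literal column names
)

# One row per country, vaccine, and year
long <- pivot_longer(wide, cols = 5:ncol(wide),
                     names_to = "year", values_to = "coverage")

# Convert the year label to a Date (first day of that year)
long$year <- as.Date(paste0(long$year, "-01-01"))
print(long)
```

Each wide row becomes one long row per year column, so the two-row input yields four rows. Missing estimates stay as `NA` in the `coverage` column.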

Discovering the global trend for each vaccine

global_data <- agg_data %>%
  filter(region == "Global")
p <- ggplot(global_data, aes(x = year, y = coverage, color = vaccine)) +
  geom_line() +
  theme_bw()
ggplotly(p)

Plotting the top and bottom countries by median coverage

n_countries <- 8
median_coverage <- data_combined %>%
  group_by(country) %>%
  summarize(median_coverage = median(coverage, na.rm = TRUE)) %>%
  arrange(desc(median_coverage))
top_countries <- head(median_coverage$country, n_countries / 2)
bottom_countries <- tail(median_coverage$country, n_countries / 2)
selected_countries <- c(top_countries, bottom_countries)
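
One thing to keep in mind with the `arrange()` plus `head()`/`tail()` pattern is that `arrange()` sorts missing values last, so a country whose median is `NA` would land in the "bottom" group. The sketch below shows one possible alternative using `slice_max()` and `slice_min()` from dplyr, with missing medians filtered out explicitly. The country labels and values are invented, and `n_each` is a hypothetical name for the number of countries taken from each end.

```r
library(dplyr)

# Invented median coverage values for six hypothetical countries
median_coverage <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  median_coverage = c(99, 45, 80, NA, 60, 97)
)

n_each <- 2  # hypothetical: how many countries to take from each end

# Highest medians
top <- median_coverage %>%
  slice_max(median_coverage, n = n_each, with_ties = FALSE)

# Lowest medians, ignoring countries with no usable estimate
bottom <- median_coverage %>%
  filter(!is.na(median_coverage)) %>%
  slice_min(median_coverage, n = n_each, with_ties = FALSE)

selected_countries <- c(top$country, bottom$country)
print(selected_countries)
```

Here country D, which has no median, is excluded from both groups rather than being treated as a low-coverage country.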

DTP3 coverage over time for the selected countries

data_selected <- data_combined %>%
  filter(country %in% selected_countries & vaccine == "DTP3")
g <- ggplot(data_selected, aes(x = year, y = coverage, color = country)) +
  geom_line() +
  theme_bw()
ggplotly(g)

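Plotting a bar chart of vaccine coverage in 2021

# Keep only the 2021 estimates
data_2021 <- data_combined %>%
  filter(year == as.Date("2021-01-01"))
# Sum coverage across vaccines per country; na.rm = TRUE keeps countries
# with a missing estimate from getting an NA total and sorting last
total_coverage <- data_2021 %>%
  group_by(country) %>%
  summarize(total_coverage = sum(coverage, na.rm = TRUE)) %>%
  arrange(total_coverage)
# Order the x-axis by total coverage
data_2021$country <- factor(data_2021$country, levels = total_coverage$country)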

g <- ggplot(data_2021, aes(x = country, y = coverage, fill = vaccine)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 2))
ggplotly(g)
#> Warning: Removed 16 rows containing missing values (`position_stack()`).

Checking countries affected by conflict

war_countries <- c("AFG", "IRQ", "SDN", "THA", "PAK", "MEX", "NGA", "SYR", "YEM")
data_war_countries <- data_combined %>%
  filter(iso3 %in% war_countries & vaccine == "DTP3")

Plotting the coverage of DTP3 for these countries

g <- ggplot(data_war_countries, aes(x = year, y = coverage, color = country)) +
  geom_line() +
  theme_bw()
ggplotly(g)

Plotting each country in its own facet

g <- ggplot(data_war_countries, aes(x = year, y = coverage)) +
  geom_line() +
  facet_wrap(~country, ncol = 3) +
  theme_bw()
ggplotly(g)